08. Gradient Boosting

In the exercise on overfitting, we're going to use a type of model we haven't encountered yet in this course, but one that is popular, well known, and has been used successfully in machine learning competitions: gradient boosted trees. Here we'll give you a short introduction to gradient boosting so that you have an intuition for how the model works.

We've already studied ensembling; boosting is another form of ensembling, combining weak learners into a strong learner, and it is typically done with decision trees as the weak learners. The video below gives a quick introduction to boosting by walking through the first successful boosting algorithm, AdaBoost.

[Video: AdaBoost (MLND SL EM 03 AdaBoost V1 MAIN V1)]

This video only scratched the surface, but you can see that AdaBoost basically works in the following way (sketched in code after the list):

  • Fit an additive model (ensemble) in a forward, stage-wise manner.
  • In each stage, introduce a new weak learner to compensate for the shortcomings of the existing weak learners.
  • In AdaBoost, "shortcomings" are identified by high-weight data points (this is what the video means by making the points "bigger").
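
To make the stage-wise idea concrete, here is a minimal sketch (not the course's exercise code) using scikit-learn's AdaBoostClassifier, which by default adds depth-1 decision trees one stage at a time and reweights the points each stage gets wrong; the toy dataset and parameter values are just illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative toy dataset, not the data from the exercise.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# By default AdaBoostClassifier uses depth-1 decision trees ("stumps")
# as weak learners and adds them to the ensemble one stage at a time,
# increasing the weight of misclassified points between stages.
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```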

Gradient boosting is very similar. The key difference is that it lets you choose how the "shortcomings" of the existing weak learners are identified: they are given by the gradients of a loss (cost) function that you can customize. If you want to learn more about the details, try this page.
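
As a rough from-scratch sketch of that idea (again, not the exercise code), consider gradient boosting with squared-error loss, where the negative gradient of the loss is simply the residual between the targets and the current prediction, so each new tree is fit to those residuals:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative toy regression data.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_stages = 100

prediction = np.full_like(y, y.mean())    # stage 0: a constant model
trees = []

for _ in range(n_stages):
    # The "shortcomings" of the current ensemble: for squared-error
    # loss, the negative gradient is just the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                # weak learner fits the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```

Swapping in a different loss function changes what the residual-like quantity looks like, which is exactly the customization gradient boosting offers.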